    • OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

      López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B. (MDPI, 2021)
      [Abstract] Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning applied mainly to video processing. The ...
    • Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs 

      López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B. (Institute of Electrical and Electronics Engineers, 2022)
      [Abstract] The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
    • Using Artificial Vision Techniques for Individual Player Tracking in Sport Events 

      López Castro, Roberto; Andrade, Diego (MDPI AG, 2019-07-31)
      [Abstract] We introduce a hybrid approach that can track an individual football player in a video sequence. This solution achieves a good balance between speed and accuracy, combining traditional object tracking techniques ...
    • VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores 

      López Castro, Roberto; Ivanov, Andrei; Andrade, Diego; Ben-Nun, Tal; Fraguela, Basilio B.; Hoefler, Torsten (Association for Computing Machinery, 2023-11)
      [Abstract] The increasing success and scaling of Deep Learning models demands higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated ...